Recent advancements in text-to-speech (TTS) synthesis show that large-scale models trained with extensive web data produce highly natural-sounding output. However, such data is scarce for Indian languages due to the lack of high-quality, manually subtitled data on platforms like LibriVox or YouTube. To address this gap, we enhance existing large-scale ASR datasets containing natural conversations collected in low-quality environments to generate high-quality TTS training data. Our pipeline leverages the cross-lingual generalization of denoising and speech enhancement models trained on English and applied to Indian languages. This results in IndicVoices-R (IV-R), the largest multilingual Indian TTS dataset derived from an ASR dataset, with 1,704 hours of high-quality speech from 10,496 speakers across 22 Indian languages.
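The pipeline described above turns noisy ASR recordings into TTS-grade training data via denoising and enhancement. As a minimal, illustrative sketch of one quality gate such a pipeline might apply (the function names and the 15 dB threshold are assumptions for illustration, not the paper's actual pipeline), the snippet below keeps only clips whose crudely estimated SNR is high enough:

```python
import math

def rms(samples):
    """Root-mean-square energy of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_snr_db(samples, frame_size=160):
    """Crude SNR estimate: ratio of the loudest frame's energy to the
    quietest frame's energy (treated as the noise floor), in dB."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    energies = sorted(rms(f) for f in frames)
    noise, speech = max(energies[0], 1e-9), max(energies[-1], 1e-9)
    return 20.0 * math.log10(speech / noise)

def keep_for_tts(samples, min_snr_db=15.0):
    """Keep a clip for TTS training only if its estimated SNR is high enough."""
    return estimate_snr_db(samples) >= min_snr_db
```

A real pipeline would pair such a filter with learned denoising and enhancement models; this only shows the shape of the selection step.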
From Linear to Spline-Based Classification: Developing and Enhancing SMPA for Noisy Non-Linear Datasets
Building upon the concepts and mechanisms used in the development of the Moving Points Algorithm (MPA), we explore how non-linear decision boundaries can be developed for classification tasks. First, we examine the classification performance of MPA along with some minor refinements to the original algorithm. We then discuss the concepts behind using cubic splines for classification with a similar learning mechanism, and finally analyze training results on synthetic datasets with known properties.
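To make the spline-boundary idea concrete, here is a toy sketch (not the paper's SMPA implementation; the knot layout and update-free evaluation are illustrative assumptions) that threads a Catmull-Rom cubic through movable control points and labels 2-D points by which side of the curve they fall on:

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Evaluate a Catmull-Rom cubic segment between p1 and p2 at t in [0, 1]."""
    return 0.5 * ((2 * p1)
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t)

def spline_boundary_y(xs, ys, x):
    """Height of the spline boundary at x, given control points (xs, ys)
    with xs sorted. Endpoint knots are duplicated so the curve spans all knots."""
    px = [xs[0]] + list(xs) + [xs[-1]]
    py = [ys[0]] + list(ys) + [ys[-1]]
    for i in range(1, len(px) - 2):
        if px[i] <= x <= px[i + 1]:
            t = (x - px[i]) / (px[i + 1] - px[i])
            return catmull_rom(py[i - 1], py[i], py[i + 1], py[i + 2], t)
    return py[1] if x < px[1] else py[-2]

def classify(xs, ys, point):
    """Label a 2-D point by which side of the spline boundary it falls on."""
    return 1 if point[1] > spline_boundary_y(xs, ys, point[0]) else -1
```

In an MPA-style learner, the control points (xs, ys) would be the quantities moved during training; here they are fixed so the geometry of the decision rule stays visible.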
VocalEyes: Enhancing Environmental Perception for the Visually Impaired through Vision-Language Models and Distance-Aware Object Detection
Chavan, Kunal, Balaji, Keertan, Barigidad, Spoorti, Chiluveru, Samba Raju
With an increasing demand for assistive technologies that promote the independence and mobility of visually impaired people, this study proposes an innovative real-time system that provides audio descriptions of a user's surroundings to improve situational awareness. The system acquires live video input and processes it with a quantized and fine-tuned Florence-2 large model, reduced to 4-bit precision for efficient operation on low-power edge devices such as the NVIDIA Jetson Orin Nano. By transforming the video stream into frames with a 5-frame latency, the model provides rapid and contextually pertinent descriptions of objects, pedestrians, and barriers, together with their estimated distances. The system employs Parler TTS Mini, a lightweight and adaptable Text-to-Speech (TTS) solution, for efficient audio feedback. It accommodates 34 distinct speaker types and enables customization of speech tone, pace, and style to suit user requirements. This study examines the quantization and fine-tuning techniques used to adapt the Florence-2 model for this application, illustrating how the integration of a compact model architecture with a versatile TTS component improves real-time performance and user experience. The proposed system is assessed on its accuracy, efficiency, and usefulness, providing a viable option to help visually impaired users navigate their surroundings safely and successfully.
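The abstract does not specify how distances are estimated, so the sketch below shows one common approach such a system could use: a pinhole-camera estimate from a detected bounding box and a known object height (the focal length and heights are illustrative assumptions, not values from the paper):

```python
def estimate_distance_m(bbox_height_px, real_height_m, focal_length_px):
    """Pinhole-camera distance estimate: an object of known real height H that
    appears h pixels tall sits roughly at distance f * H / h from the camera."""
    if bbox_height_px <= 0:
        raise ValueError("bounding box height must be positive")
    return focal_length_px * real_height_m / bbox_height_px

def describe(label, bbox_height_px, real_height_m, focal_length_px=700.0):
    """Compose a short, audio-ready description with an estimated distance."""
    d = estimate_distance_m(bbox_height_px, real_height_m, focal_length_px)
    return f"{label} about {d:.1f} meters ahead"
```

In practice the string returned by `describe` would be fed to the TTS component for spoken output.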
On the Development of Binary Classification Algorithm Based on Principles of Geometry and Statistical Inference
The aim of this paper is to investigate the construction of a binary classification algorithm using principles of geometry such as vectors, planes, and vector algebra. The basic idea behind the proposed algorithm is that a hyperplane can completely separate a given set of data points mapped to n-dimensional space, provided the points are linearly separable in those n dimensions. Since points are the foundational elements of any geometric construct, manipulating the positions of the points used to construct a given hyperplane also manipulates the position of the hyperplane itself. The paper tests the algorithm against other classifiers on a variety of standard machine learning datasets, with a focus on support vector machines: both SVMs and our proposed classifier use the same geometric construct of a hyperplane, and the versatility of SVMs makes them a good benchmark for comparison. Since the algorithm focuses on moving points through the hyperspace to which the dataset has been mapped, it has been dubbed the moving points algorithm.
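A toy reconstruction of the idea just described (based only on this abstract; the actual algorithm's update rule may differ): the hyperplane is the perpendicular bisector of two movable prototype points, one per class, and misclassified samples pull their class's point toward them, thereby moving the hyperplane:

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class MovingPoints:
    """Sketch of a 'moving points' classifier: the separating hyperplane is
    the perpendicular bisector of two movable prototype points, one per class.
    Misclassified samples pull their own class's point toward them."""

    def __init__(self, p_pos, p_neg, lr=0.5):
        self.p = {1: list(p_pos), -1: list(p_neg)}
        self.lr = lr

    def predict(self, x):
        """Label by which prototype is nearer (i.e. which side of the bisector)."""
        return 1 if dist2(x, self.p[1]) < dist2(x, self.p[-1]) else -1

    def fit(self, xs, ys, epochs=20):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                if self.predict(x) != y:
                    # move the correct class's point toward the sample
                    self.p[y] = [p + self.lr * (xi - p)
                                 for p, xi in zip(self.p[y], x)]
```

Moving a prototype moves the bisecting hyperplane, which is exactly the geometric mechanism the abstract describes.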
A Comprehensive Insight into Drones: History, Classification, Architecture, Navigation, Applications, Challenges, and Future Trends
Singh, Ruchita, Kumar, Sandeep
Unmanned Aerial Vehicles (UAVs), commonly known as drones, are among the 21st century's most transformative technologies. Emerging first for military use, advancements in materials, electronics, and software have turned drones into multipurpose tools for a wide range of industries. In this paper, we cover the history, taxonomy, architecture, and navigation systems of drones, along with their associated applications. The paper explores important future trends such as autonomous navigation, AI integration, and obstacle avoidance systems, emphasizing how they contribute to improving the efficiency and versatility of drones. It also examines the major challenges (technical, environmental, economic, regulatory, and ethical) that limit the actual uptake of drones, as well as trends that are likely to mitigate these obstacles in the future. This work offers a structured synthesis of existing studies and perspectives, providing insight into how drones will transform agriculture, logistics, healthcare, disaster management, and other areas, while also identifying new opportunities for innovation and development.
RNN-Based Models for Predicting Seizure Onset in Epileptic Patients
Mounagurusamy, Mathan Kumar, S, Thiyagarajan V, Rahman, Abdur, Chandak, Shravan, Balaji, D., Jallepalli, Venkateswara Rao
Early management and better clinical outcomes for epileptic patients depend on seizure prediction. The accuracy and false-alarm rates of existing systems are often compromised by their dependence on static thresholds and basic Electroencephalogram (EEG) properties. This article proposes a novel Recurrent Neural Network (RNN)-based method for seizure onset prediction to overcome these limitations. In contrast to conventional techniques, the proposed system uses Long Short-Term Memory (LSTM) networks to extract temporal correlations from unprocessed EEG data, enabling the system to adapt dynamically to the unique EEG patterns of each patient and improving prediction accuracy. The methodology comprises thorough data collection, preprocessing, and LSTM-based feature extraction; annotated EEG datasets are then used for model training and validation. Results show a considerable reduction in false-alarm rates (an average of 6.8%) and an improvement in prediction accuracy (90.2% sensitivity, 88.9% specificity, and an AUC-ROC of 93). Additionally, computational efficiency is significantly higher than that of existing systems (12 ms processing time, 45 MB memory consumption). These results demonstrate the effectiveness of the proposed RNN-based strategy for improving seizure prediction reliability, opening up possibilities for practical application in epilepsy treatment.
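The gating mechanism that lets an LSTM carry long-range temporal context from an EEG stream can be shown in a few lines. The sketch below is a generic scalar LSTM cell for illustration (weights, state size, and the absence of a classifier head are simplifications, not the paper's architecture):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W):
    """One LSTM step for scalar input and state.
    W maps gate name ('i', 'f', 'o', 'c') -> (w_x, w_h, bias)."""
    gates = {g: sigmoid(W[g][0] * x + W[g][1] * h + W[g][2])
             for g in ("i", "f", "o")}
    c_tilde = math.tanh(W["c"][0] * x + W["c"][1] * h + W["c"][2])
    c_new = gates["f"] * c + gates["i"] * c_tilde   # forget old, write new
    h_new = gates["o"] * math.tanh(c_new)           # exposed hidden state
    return h_new, c_new

def run_lstm(signal, W):
    """Unroll the cell over a 1-D signal and return the final hidden state,
    which a classifier head would then map to a seizure-risk score."""
    h = c = 0.0
    for x in signal:
        h, c = lstm_step(x, h, c, W)
    return h
```

A real seizure predictor would use vector states, learned weights, and multichannel EEG windows; the gating arithmetic is the same.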
Transformers with Sparse Attention for Granger Causality
Mahesh, Riya, Vashisht, Rahul, Lakshminarayanan, Chandrashekar
Temporal causal analysis means understanding the underlying causes behind observed variables over time. Deep learning based methods such as transformers are increasingly used to capture temporal dynamics and causal relationships beyond mere correlations. Recent works suggest that the self-attention weights of transformers are a useful indicator of causal links. We leverage this to propose a novel modification to the self-attention module that establishes causal links between the variables of multivariate time-series data with varying lag dependencies. Our Sparse Attention Transformer captures causal relationships using a two-fold approach: it performs temporal attention first, followed by attention between the variables across time steps, masking them individually to compute Granger causality indices. The key novelty of our approach is the model's ability to assign importance to, and pick, the most significant past time instances for its prediction task, rather than relying on a manually fed, fixed time-lag value. We demonstrate the effectiveness of our approach via extensive experimentation on several synthetic benchmark datasets. Furthermore, we compare the performance of our model with the traditional Vector Autoregression-based Granger causality method, which assumes a fixed lag length.
A Novel Adaptive Hybrid Focal-Entropy Loss for Enhancing Diabetic Retinopathy Detection Using Convolutional Neural Networks
V, Pandiyaraju, Malarvannan, Santhosh, Venkatraman, Shravan, A, Abeshek, B, Priyadarshini, A, Kannan
Diabetes-related complications occur on a global scale and lead to substantial vision loss and, in many cases, blindness [2]. Millions of people across the globe are affected by DR, and this adds sweeping challenges to healthcare systems, especially in areas with a rising diabetic population base. Gary et al. (2017) further pointed out the flaws in traditional manual detection and treatment methods for DR, arguing that such approaches are not only costly but time-consuming and usually involve an element of human mistake. For this reason, Convolutional Neural Networks (CNNs) have become very helpful in automating DR diagnosis because they drastically enhance diagnostic accuracy [3]. The network scans the retina for the presence of microaneurysms, hemorrhages, exudates, and so forth.

The application of Artificial Intelligence (AI) technology in the field of medical imaging has grown rapidly. Despite the advancement in diagnostic methods, most of these still necessitate manual assessment, which is inefficient, subjective, and inconsistent in quality. This is quite worrying in countries where the incidence of diabetes is high, such as the Khyber Pakhtunkhwa province of Pakistan, where around 30% of the population is diabetic and 4% of blindness cases are attributed to DR. As manual approaches fail to meet the growing demand, automated strategies using deep learning techniques, such as Convolutional Neural Networks (CNNs), seem to be an efficient and more scalable solution.
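The paper's title refers to an adaptive hybrid focal-entropy loss, whose exact form is not given in this excerpt. As a hedged illustration of the general family, the sketch below blends standard binary focal loss with cross-entropy; the blending weight `lam` and all parameter values are assumptions for illustration, not the paper's loss:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights well-classified examples so training
    focuses on hard ones. p is the predicted probability of class 1, y in {0, 1}."""
    pt = p if y == 1 else 1.0 - p          # probability of the true class
    a = alpha if y == 1 else 1.0 - alpha
    return -a * (1.0 - pt) ** gamma * math.log(max(pt, 1e-12))

def hybrid_focal_entropy(p, y, gamma=2.0, alpha=0.25, lam=0.5):
    """Hypothetical hybrid: a convex blend of focal loss and cross-entropy,
    showing the general shape of a focal/entropy combination."""
    pt = p if y == 1 else 1.0 - p
    ce = -math.log(max(pt, 1e-12))
    return lam * focal_loss(p, y, gamma, alpha) + (1.0 - lam) * ce
```

With gamma = 0 and alpha = 1 the focal term reduces exactly to cross-entropy, which is why the two are natural partners in a hybrid.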
A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond
Bansal, Shubhi, A, Sreeharish, J, Madhava Prasath, S, Manikandan, Madisetty, Sreekanth, Rehman, Mohammad Zia Ur, Raghaw, Chandravardhan Singh, Duggal, Gaurav, Kumar, Nagendra
Mamba, a special case of the State Space Model, is gaining popularity as an alternative to template-based deep learning approaches in medical image analysis. While transformers are powerful architectures, they have drawbacks, including quadratic computational complexity and an inability to address long-range dependencies efficiently. This limitation affects the analysis of large and complex datasets in medical imaging, where there are many spatial and temporal relationships. In contrast, Mamba offers benefits that make it well-suited for medical image analysis. It has linear time complexity, which is a significant improvement over transformers. Mamba processes longer sequences without attention mechanisms, enabling faster inference and requiring less memory. Mamba also demonstrates strong performance in merging multimodal data, improving diagnostic accuracy and patient outcomes. The organization of this paper allows readers to appreciate the capabilities of Mamba in medical imaging step by step. We begin by defining the core concepts of SSMs and related models, including S4, S5, and S6, followed by an exploration of Mamba architectures such as pure Mamba, U-Net variants, and hybrid models with convolutional neural networks, transformers, and Graph Neural Networks. We also cover Mamba optimizations, techniques and adaptations, scanning strategies, datasets, applications, and experimental results, and conclude with its challenges and future directions in medical imaging. This review aims to demonstrate the transformative potential of Mamba in overcoming existing barriers within medical imaging while paving the way for innovative advancements in the field. A comprehensive list of the Mamba architectures applied in the medical field and reviewed in this work is available on GitHub.
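The linear-time property the survey highlights comes from the recurrent scan at the core of S4/Mamba-style SSMs. A minimal scalar sketch (real models use vector states and, in Mamba, input-dependent "selective" parameters):

```python
def ssm_scan(A, B, C, u):
    """Discrete linear state-space recurrence:
        x_t = A * x_{t-1} + B * u_t,   y_t = C * x_t,
    scanned over the input sequence u in one linear-time pass.
    Scalar state keeps the sketch minimal."""
    x, ys = 0.0, []
    for u_t in u:
        x = A * x + B * u_t     # state update carries long-range context
        ys.append(C * x)        # readout at each step
    return ys
```

Because each step touches the state once, sequence length n costs O(n), versus the O(n^2) attention comparison the survey contrasts against.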
A Channel Attention-Driven Hybrid CNN Framework for Paddy Leaf Disease Detection
V, Pandiyaraju, Venkatraman, Shravan, A, Abeshek, S, Pavan Kumar, A, Aravintakshan S, M, Senthil Kumar A, A, Kannan
Farmers face various challenges when it comes to identifying diseases in rice leaves during their early stages of growth, which is a major reason for poor produce. Therefore, early and accurate disease identification is important in agriculture to avoid crop loss and improve cultivation. In this research, we propose a novel hybrid deep learning (DL) classifier designed by extending the Squeeze-and-Excitation network architecture with a channel attention mechanism and the Swish ReLU activation function. The channel attention mechanism in our proposed model identifies the most important feature channels required for classification during feature extraction and selection. The dying ReLU problem is mitigated by utilizing the Swish ReLU activation function, and the Squeeze-and-Excitation blocks improve information propagation and cross-channel interaction. Upon evaluation, our model achieved a high F1-score of 99.76% and an accuracy of 99.74%, surpassing the performance of existing models. These outcomes demonstrate the potential of state-of-the-art DL techniques in agriculture, contributing to the advancement of more efficient and reliable disease detection systems.
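The channel-attention mechanism described above follows the standard Squeeze-and-Excitation pattern, which can be sketched without a DL framework (the tiny dense weights and pure-Python tensors below are illustrative, not the paper's model):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def se_attention(feature_maps, w1, w2):
    """Squeeze-and-Excitation sketch: global-average-pool each channel
    ('squeeze'), pass the channel descriptor through two tiny dense layers
    ('excitation'), then rescale every channel by its learned gate.
    feature_maps: list of channels, each a flat list of activations.
    w1: reduction matrix (r x C), w2: expansion matrix (C x r)."""
    squeezed = [sum(ch) / len(ch) for ch in feature_maps]          # squeeze
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))  # ReLU
              for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)))      # sigmoid
             for row in w2]
    return [[g * a for a in ch] for g, ch in zip(gates, feature_maps)]
```

Each channel is multiplied by a gate in (0, 1), so uninformative channels are suppressed while the most diagnostic ones pass through, which is the selection behavior the abstract attributes to its channel attention.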